BACKGROUND


The definition of distributed computing has many variations. Some of the more detailed definitions can be found in Weitzman (1980) and Sharp (1987). For the purposes of this report, the definitions below will suffice:

Tightly Coupled Distributed System (TCDS). A TCDS is a system in which several processors have access to a common block of memory. A number of large mainframes are of this type (the Univac 1100 series is one example). Each processor has its own address hardware for fetching data, and an overall priority scheme governs how all the processors access that memory.

Loosely Coupled Distributed System (LCDS). An LCDS is a system in which there is no common memory; data are shared over a network. Each processor must have a protocol facility for accessing data/message packets on the network. Both Apollo and Sun market systems of this type.

This definition is generalized. A more exact version can be found in Liebowitz and Carson (1985).

INTRODUCTION 

This paper describes the Digital Dynamics Processor (DDP) of the Naval Ocean Systems Center (NOSC), Code 911. The DDP is a multiprocessor system designed for high-speed numerical calculation. It implements a new variation of common memory that allows a far greater number of processors than is currently found on a TCDS.

The conceptual design of the DDP was done in mid and late 1984 by Jack Zyphur. A variation of this design was implemented in hardware by Mitchell (1989) in early 1985. The first prototype consisted of only two processor slaves, a repeater slave, and a master. Since the original goal of the DDP was to replace an EAI 8800 Analog computer, the first attempt at software for the DDP was a simulation language consisting of assembly language modules that performed the same functions as analog components. This attempt was later abandoned in favor of direct equation implementation using a C cross-compiler that ran on the master. In early 1987 Murphy programmed a quaternion algorithm, and the DDP proved operational in the NOSC Hybrid-Simulator. The DDP has since been upgraded to six processor slaves running MC68030 CPUs at 25 MHz.

HARDWARE 

The DDP's maximum configuration is 16 cages, each holding 15 Processor Slaves (PSs) and one Repeater Slave (RS), with one Master Processor controlling the system (figure 1). The system can be used with any number of PSs from 0 to 239. The RS is used primarily to buffer signals between the master and the multiple sets of cages (figure 2).
Each cage has two 32-bit data buses: the system bus (S-bus) and the variable bus (V-bus) (figure 3). The PSs share common data with each other over the V-bus using a PUT-TAKE system.

The PUT-TAKE system consists of one 12-bit binary counter located on the master, plus a Map Memory and a Variable Memory internal to each PS (figure 4). The counter provides a common address to the Map Memories on all the PSs simultaneously, while each individual Variable Memory gets its address from the output of the Map Memory on its own card. When the system is initialized, the master downloads a unique binary image into each PS's Map Memory. When the PS starts running, the counter continually loops through up to 4096 unique sets of address and control bits stored in Map Memory. Based on the state of the PUT and TAKE bits in Map Memory at the current counter location, the PS either puts data from Variable Memory onto the V-bus, takes data from the V-bus and stores it in Variable Memory, or remains passive. Normally only one PS will PUT to the V-bus at a time. Since PUT/TAKE has priority in accessing Variable Memory, the PS CPU can access Variable Memory only during IDLE times (neither PUT nor TAKE). This type of virtual common memory allows multiple PSs to get a current copy of the data at the same time (several TAKING at once).
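The slot-by-slot behavior of the PUT-TAKE cycle can be made concrete with a small software model. This is a minimal C sketch, not the DDP's hardware logic: the `map_entry` layout, the `MAP_PUT`/`MAP_TAKE` encoding, and `vbus_cycle()` are hypothetical names chosen for illustration. Each pass of the outer loop stands for one value of the 12-bit counter; the PS whose entry says PUT drives the bus, and every PS whose entry says TAKE latches the same word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of the PUT-TAKE mechanism, for illustration only.
   The entry layout and names are hypothetical, not the DDP's map format. */

#define MAP_SLOTS 4096u  /* 12-bit counter: 2^12 slots */
#define VAR_WORDS 4096u  /* 4096 x 32-bit Variable Memory words */

enum map_op { MAP_IDLE = 0, MAP_PUT = 1, MAP_TAKE = 2 };

/* One Map Memory entry: control bits plus a Variable Memory address. */
struct map_entry {
    uint8_t  op;    /* MAP_IDLE, MAP_PUT or MAP_TAKE */
    uint16_t addr;  /* Variable Memory address, 0..4095 */
};

/* One processor slave: its Map Memory (downloaded by the master)
   and its local Variable Memory. */
struct slave {
    struct map_entry map[MAP_SLOTS];
    uint32_t         var[VAR_WORDS];
};

/* Run one full counter cycle over n_ps slaves and return the number of
   slots in which more than one PS tried to PUT. */
static int vbus_cycle(struct slave *slaves, int n_ps)
{
    int conflicts = 0;
    assert(slaves != NULL && n_ps > 0);
    for (unsigned counter = 0; counter < MAP_SLOTS; counter++) {
        uint32_t bus = 0;
        int putters = 0;
        /* PUT phase: the common counter indexes every Map Memory at once. */
        for (int i = 0; i < n_ps; i++) {
            const struct map_entry *e = &slaves[i].map[counter];
            if (e->op == MAP_PUT) {
                assert(e->addr < VAR_WORDS);
                bus = slaves[i].var[e->addr];  /* last putter wins in this model */
                putters++;
            }
        }
        if (putters > 1)
            conflicts++;
        if (putters == 0)
            continue;  /* no data on the V-bus in this slot */
        /* TAKE phase: any number of PSs latch the same bus value. */
        for (int i = 0; i < n_ps; i++) {
            const struct map_entry *e = &slaves[i].map[counter];
            if (e->op == MAP_TAKE) {
                assert(e->addr < VAR_WORDS);
                slaves[i].var[e->addr] = bus;
            }
        }
    }
    return conflicts;
}
```

When two PSs PUT in the same slot, this model keeps the last value and counts a conflict; that is a simplification, since the text says only that normally one PS PUTs at a time. CPU access during IDLE slots is not modeled.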


The PUT-TAKE system gives each PS the impression that it has 16K bytes (4096 x 32 bits) of memory in common with all the other PSs, which by the earlier definition is a TCDS. The PUT-TAKE system is implemented in hardware, so there is no overhead cost to the PS's CPU as there is in an LCDS. For large lookup tables, there is enough Program Memory on each PS for it to store its own copy of the table if need be. This leaves the V-bus solely for active data sharing. The V-bus takes 300 nsec to transfer an individual 32-bit value from one PS's Variable Memory to any or all of the other PSs' Variable Memories. If fewer than 4096 variables are used, more critical variables can be repeated in Map Memory to provide a faster update rate. The data latency between one CPU's write into Variable Memory and another CPU's read of the new value is approximately 1 µsec to 1 msec. The actual time depends on where in the V-bus counter cycle the write occurs, how many total variables are used, and the number of times the specific variable is repeated.
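As a rough worked check on these latency figures, the worst-case wait for a variable is about one gap between its V-bus slots multiplied by the 300 nsec transfer time. The sketch below assumes that the counter loops only over the slots in use and that repeated copies of a variable are spaced evenly; both are assumptions, and `worst_latency_ns()` is a hypothetical helper.

```c
#include <assert.h>

#define VBUS_TRANSFER_NS 300UL  /* per 32-bit transfer, from the text */

/* Approximate worst-case time, in nanoseconds, from a CPU write into
   Variable Memory until that variable next crosses the V-bus, assuming:
   - the counter loops over n_slots used Map Memory slots (1..4096), and
   - the variable occupies `repeats` slots spaced evenly in the cycle. */
static unsigned long worst_latency_ns(unsigned n_slots, unsigned repeats)
{
    assert(n_slots >= 1 && n_slots <= 4096);
    assert(repeats >= 1 && repeats <= n_slots);
    /* Gap between successive copies, rounded up to a whole slot. */
    unsigned long gap_slots = (n_slots + repeats - 1) / repeats;
    return gap_slots * VBUS_TRANSFER_NS;
}
```

For a full 4096-slot cycle with no repetition this gives 4096 x 300 nsec, or about 1.23 msec, in line with the approximate 1 msec upper bound; repeating that variable four times cuts its worst case to about 0.31 msec, and using fewer slots shortens every variable's wait.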

To handle communications with external devices, each PS also has two 16-bit parallel input ports and two 16-bit parallel output ports for data I/O, plus a serial port for console I/O. The two 16-bit input ports and the two 16-bit output ports may be used in tandem to provide one 32-bit parallel input port and one 32-bit parallel output port if desired.
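Using the two 16-bit ports in tandem amounts to splitting a 32-bit word into two halves. A minimal C sketch follows; which port carries the high half is an assumption, and `join_ports()`/`split_ports()` are hypothetical names, not routines from the DDP library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine two 16-bit input port readings into one 32-bit value.
   Assumption: port_hi supplies bits 31..16, port_lo bits 15..0. */
static uint32_t join_ports(uint16_t port_hi, uint16_t port_lo)
{
    return ((uint32_t)port_hi << 16) | port_lo;
}

/* Split a 32-bit output value across the two 16-bit output ports. */
static void split_ports(uint32_t value, uint16_t *port_hi, uint16_t *port_lo)
{
    assert(port_hi != NULL && port_lo != NULL);
    *port_hi = (uint16_t)(value >> 16);
    *port_lo = (uint16_t)(value & 0xFFFFu);
}
```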


SOFTWARE 

The DDP PSs have no built-in operating system and are controlled by the master. This leaves all of each PS's memory available for program and data, and allows each PS's CPU to do the one thing it should be doing: crunching numbers. This arrangement does place a little more responsibility on the programmer.

The programmer's task is to divide up the problem and assign each PS its portion. This job needs to be done with only one hardware consideration in mind: minimize the number of variables on the V-bus. By doing this, the programmer minimizes the data latency and makes the virtual common memory behave more like an ideal common memory.

Figure 3. NOSC DDP processor slave data paths.

Figure 4. NOSC DDP processor slave address paths.

There is a library of subroutines for the PSs that handles the low-level functions, along with some tools for debugging. This library has routines for getting data into and out of the I/O ports (16-bit, 32-bit, or serial); a deadman indicator (a register that must be tickled repeatedly, or else an LED lights, for debugging); a programmable periodic interrupt that goes to all PSs (it can be used for synchronizing all the PSs); and a special version of printf() that writes to the PS's serial port (also handy for debugging).
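The deadman indicator's behavior can be modeled as a countdown that the program reloads each time it tickles the register. In this hypothetical C sketch, `deadman_tick()` stands in for whatever time base the real indicator uses, and both the timeout length and the LED staying lit once tripped are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical software model of a deadman indicator. */
struct deadman {
    unsigned reload;     /* ticks allowed between tickles (assumed) */
    unsigned remaining;  /* ticks left before the LED lights */
    bool     led_on;     /* stays lit once tripped (assumed) */
};

static void deadman_init(struct deadman *d, unsigned reload)
{
    assert(d && reload > 0);
    d->reload = reload;
    d->remaining = reload;
    d->led_on = false;
}

/* The program calls this regularly to show it is still running. */
static void deadman_tickle(struct deadman *d)
{
    d->remaining = d->reload;
}

/* Stand-in for the indicator's time base: one call per tick. */
static void deadman_tick(struct deadman *d)
{
    if (d->remaining > 0)
        d->remaining--;
    if (d->remaining == 0)
        d->led_on = true;  /* program stopped tickling: light the LED */
}
```

A PS main loop that tickles once per pass keeps the LED dark; if the loop hangs, the LED lights and identifies the stalled PS.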

The master has utilities for clearing all the PSs, halting all the PSs' CPUs, and downloading the binary images into Program and Map Memory, as well as a C cross-compiler for programming the PSs and a compiler that generates the binary image for Map Memory. The Map Memory compiler reads an ASCII file in which the programmer specifies which PSs PUT and TAKE which variables. The compiler checks for such things as unique names and undefined variables (TAKEN but not PUT), and warns when more than one PS is putting at a time. When compilation is complete, the compiler outputs the binary Map Memory image for each PS, a header file for each PS (for inclusion in that PS's C program), and some statistics on the V-bus loading. The header file for each PS defines the absolute Variable Memory locations for that PS's shared data. This header file is a helpful debugging aid for making sure each PS has all the V-bus variables defined.
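Two of the consistency checks that the Map Memory compiler performs can be sketched as follows. The real ASCII file syntax is not given here, so this C sketch starts from an already-parsed, hypothetical `struct spec` array (one entry per PUT or TAKE) and treats each PUT entry as coming from a different PS. It reports variables that are TAKEN but never PUT as errors and variables with more than one PUT as warnings, and it counts the distinct variables, which equals the number of V-bus slots needed if each variable gets exactly one slot (also an assumption). The unique-name check and image/header generation are not modeled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one PUT or TAKE line from the Map Memory
   source file; the real file syntax is not described in the text. */
enum spec_op { SPEC_PUT, SPEC_TAKE };

struct spec {
    const char  *name;  /* shared variable name */
    int          ps;    /* processor slave number */
    enum spec_op op;
};

struct check_result {
    int n_vars;     /* distinct variables (one V-bus slot each, assumed) */
    int undefined;  /* TAKEN but never PUT: errors */
    int multi_put;  /* more than one PUT entry: warnings */
};

/* Check the specs, print diagnostics on stderr, and return counts.
   Each PUT entry is assumed to come from a different PS. */
static struct check_result check_specs(const struct spec *s, int n)
{
    struct check_result r = {0, 0, 0};
    assert(s != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        /* Handle each variable once, at its first occurrence. */
        int seen = 0;
        for (int j = 0; j < i && !seen; j++)
            seen = strcmp(s[j].name, s[i].name) == 0;
        if (seen)
            continue;
        r.n_vars++;
        int n_put = 0;
        for (int j = i; j < n; j++)
            if (s[j].op == SPEC_PUT && strcmp(s[j].name, s[i].name) == 0)
                n_put++;
        if (n_put == 0) {
            fprintf(stderr, "error: %s is TAKEN but never PUT\n", s[i].name);
            r.undefined++;
        } else if (n_put > 1) {
            fprintf(stderr, "warning: %s has %d PUT entries\n", s[i].name, n_put);
            r.multi_put++;
        }
    }
    return r;
}
```

The quadratic name search is adequate for a few thousand variables; a real compiler would more likely use a symbol table.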

Once a problem is implemented, setup and loading of the DDP are accomplished by executing a batch job on the master. An operator can then take over control of the DDP, freeing the programmer for other assignments.

IMPLEMENTATION 

As noted in the introduction, revision 1 of the DDP proved successful by implementing a solution for the quaternion parameters in the NOSC Hybrid-Simulator. Revision 2 handled a much greater problem: the real-time solution of the 6-Degree of Freedom (6-DOF) hydrodynamic equations of the Mk 50 torpedo. This solution included the previously written quaternion algorithm. This problem, to which an EAI 8800 Analog computer had previously been dedicated (the analog computer did not include the quaternions), was accomplished with six PSs. The problem was divided as follows:

PS (1) X velocity and roll
PS (2) Y velocity and yaw
PS (3) Z velocity and pitch
PS (4) Operator control interface and Simulator I/O
PS (5) Quaternion states
PS (6) Quaternion to Euler conversion and Simulator I/O
To help interface with the Hybrid-Simulator, an analog cage was designed that contained A/D and D/A converters, so that analog values could be read in and placed on the V-bus, and data from the V-bus could be sent out as analog values.

The successful demonstration that the DDP could replace an analog system in real-time simulation was held on 4 Jan 1989.

Revision 3 of the DDP hardware is being set up to run two additional problems. The first task is the implementation of the Mk 46 torpedo 6-DOF equations with quaternions, an autopilot, and engine simulation. The second task is to perform the calculations that control an underwater passive acoustic simulation system.